Move away from simple strat for SM keyspace #4727

Michal-Leszczynski · 2025-12-29T23:08:10Z

Alternative approach to #4555.

While the other PR focused on keeping SimpleStrategy and working around its limitations, this PR goes in the direction of using NetworkTopologyStrategy instead.

The reasoning for that is:

- this change targets just the default SM keyspace; the user can create the keyspace manually if needed
- for the default single node SM DB those strategies are equivalent
- for a single DC SM DB they are also equivalent
- for a multi DC SM DB we already recommend that the user create the SM keyspace manually with the desired replication
- SimpleStrategy is not recommended for production in general
- if we keep SimpleStrategy, we might end up with a similar issue in some time

More details in the commit messages.

Merged commit messages

SimpleStrategy is not recommended for production environments.
It also does not support tablets, which is the direction ScyllaDB
is moving in (at some distant point, vnodes might even be deprecated).
Because of that, it also runs into unexpected problems like
#4555.

For those reasons, we should switch from keeping SM data in a
SimpleStrategy keyspace to a NetworkTopologyStrategy keyspace by default.

For the default, single local node SM DB cluster, both strategies
result in the single node containing all of the SM data. For a single DC
SM DB cluster they are also equivalent. On the other hand, for a multi
DC SM DB cluster (really not recommended...), NetworkTopologyStrategy
keeps replication_factor replicas per DC and not per cluster.
This is theoretically problematic, but such a setup is also problematic for
SimpleStrategy, which would still replicate the data across all DCs, so it's
not a new problem. Moreover, this change only defines the default SM
keyspace creation. If a user has some non-default (perhaps not recommended)
SM DB setup, they can always create the SM keyspace manually before starting
the SM server to configure it to their liking.
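
The per-DC versus per-cluster difference described above can be sketched as simple arithmetic. This is only an illustration: the function name `totalReplicas` is hypothetical, and it assumes every DC has at least `rf` nodes so that the requested replica count can actually be placed.

```go
package main

import "fmt"

// totalReplicas returns how many copies of each row the whole cluster keeps.
// strategy is the replication class, rf is the replication_factor value and
// dcs is the number of datacenters in the SM DB cluster.
// Assumption: every DC has at least rf nodes.
func totalReplicas(strategy string, rf, dcs int) int {
	switch strategy {
	case "NetworkTopologyStrategy":
		// replication_factor applies to each DC separately.
		return rf * dcs
	default:
		// SimpleStrategy: replication_factor applies to the cluster as a whole.
		return rf
	}
}

func main() {
	for _, dcs := range []int{1, 2} {
		fmt.Printf("DCs=%d SimpleStrategy=%d NetworkTopologyStrategy=%d\n",
			dcs,
			totalReplicas("SimpleStrategy", 3, dcs),
			totalReplicas("NetworkTopologyStrategy", 3, dcs))
	}
}
```

With a single DC both strategies give the same count, which is why the default deployment is unaffected; only the multi DC case diverges.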

Because of #4555, but also because
it's better to test the recommended keyspace replication strategy, we need to replace
SimpleStrategy with NetworkTopologyStrategy in tests wherever possible.
If a test aims to run against SimpleStrategy specifically, it needs to be
adjusted to ensure that the SimpleStrategy keyspace does not use tablets.
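
As a rough sketch of the last point, a test helper could build a SimpleStrategy keyspace statement that opts out of tablets explicitly. The helper name `simpleStrategyKeyspaceStmt` and the keyspace name are hypothetical, and the sketch assumes the backend accepts the `tablets = {'enabled': false}` keyspace option; it is not necessarily how the tests in this PR do it.

```go
package main

import "fmt"

// simpleStrategyKeyspaceStmt builds a CREATE KEYSPACE statement for a test
// that needs SimpleStrategy specifically. Tablets are disabled explicitly,
// because SimpleStrategy does not support them.
// Assumption: the backend understands the "tablets" keyspace option.
func simpleStrategyKeyspaceStmt(keyspace string, rf int) string {
	return fmt.Sprintf(
		"CREATE KEYSPACE IF NOT EXISTS %s WITH replication = "+
			"{'class': 'SimpleStrategy', 'replication_factor': %d} "+
			"AND tablets = {'enabled': false}",
		keyspace, rf)
}

func main() {
	// "test_simple" is a made-up keyspace name used only for illustration.
	fmt.Println(simpleStrategyKeyspaceStmt("test_simple", 1))
}
```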


VAveryanov8

Seems reasonable to me!

I'm curious how it will behave with the replace manager procedure in siren, e.g. when the scylla-manager schema is restored from the self-backup?

Michal-Leszczynski · 2026-01-22T11:16:01Z

> I'm curious how it will behave with the replace manager procedure in siren, e.g. when the scylla-manager schema is restored from the self-backup?

So this PR only changes how the SM keyspace is created from scratch. In terms of siren self-backup, if the SM keyspace in the backup was using SimpleStrategy, then during the self-restore procedure SM would create a new, temporary SM keyspace with NetworkTopologyStrategy from scratch, restore the SM keyspace from the backup, and switch back to using the one from the backup.

Michal-Leszczynski · 2026-01-22T11:17:00Z

@paszkow does the approach presented in this PR make sense to you?

paszkow

Moving to NetworkTopologyStrategy is definitely better, but it raises a few questions:

- How big is the keyspace in production? We might want to tweak tablet options for it.
- When is it created?
- With the rf_rack_valid_keyspaces option, it's yet another non-user keyspace that a user will have to alter while decommissioning a rack. We need to keep that in mind.

\cc @bhalevy

paszkow · 2026-01-23T06:52:51Z

pkg/cmd/scylla-manager/db.go

 }

-const createKeyspaceStmt = "CREATE KEYSPACE {{.Keyspace}} WITH replication = {'class': 'SimpleStrategy', 'replication_factor': {{.ReplicationFactor}}}"
+const createKeyspaceStmt = "CREATE KEYSPACE {{.Keyspace}} WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': {{.ReplicationFactor}}}"


Do you know how big this table is in production? When tablets are enabled, it will create 10 tablets/shard by default. Maybe we should start small and let a load balancer split it when needed. \cc @bhalevy

See this comment for more context.
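
For readers unfamiliar with the `{{.Keyspace}}` placeholders, here is a minimal sketch of how such a statement can be rendered with Go's `text/template`. The constant is copied from the "+" line of the diff above; the `keyspaceParams` struct, the `renderCreateKeyspace` helper, and the `scylla_manager` keyspace name are assumptions made for illustration and may not match the actual code in `db.go`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// createKeyspaceStmt is copied from the "+" line of the diff above.
const createKeyspaceStmt = "CREATE KEYSPACE {{.Keyspace}} WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': {{.ReplicationFactor}}}"

// keyspaceParams is a hypothetical struct providing the template fields.
type keyspaceParams struct {
	Keyspace          string
	ReplicationFactor int
}

// renderCreateKeyspace fills in the placeholders and returns the CQL text.
func renderCreateKeyspace(p keyspaceParams) (string, error) {
	t, err := template.New("createKeyspace").Parse(createKeyspaceStmt)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The keyspace name and RF=1 are illustrative values for a single node setup.
	stmt, err := renderCreateKeyspace(keyspaceParams{Keyspace: "scylla_manager", ReplicationFactor: 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(stmt)
}
```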

paszkow · 2026-01-23T07:05:10Z

dist/etc/scylla-manager.yaml

 #  local_dc:
 #
 # Keyspace for management data.
+# It will be created automatically on ScyllaDB Manager server startup if it does not already exist.


Is the manager server created once a cluster is fully constructed? By default RF=3 is used. That requires 3 nodes/racks to already exist. Otherwise, the keyspace will not be created.

See this comment for more context.

Michal-Leszczynski · 2026-01-23T09:20:22Z

Let me put some more context here.

In terms of SM backend specification, in theory, it's possible to use any ScyllaDB cluster as the SM backend, but the default orchestration (also the one used in siren and operator) is a single node ScyllaDB cluster located on the SM VM. This cluster runs on a single shard and is limited to 500M of memory. It also runs in developer mode. This should demonstrate that we expect only a small amount of data to be kept there: just SM task definitions and their execution history/progress.
So this cluster has really limited resources and is separate from the user cluster.

Having said that, we've seen some custom, manual deployments where the SM cluster and the user cluster are the same one, but in such a case, it's still possible to create the SM keyspace manually to specify the desired replication and such.

In terms of this keyspace creation, when the SM server is started, it checks if the SM keyspace exists. If it does, SM uses it for storing its data. If it doesn't exist, SM creates it (previously with SimpleStrategy, after this PR with NetworkTopologyStrategy). So the changes made in this PR won't affect existing deployments, only new ones. Moreover, IIUC, moving from SimpleStrategy to NetworkTopologyStrategy for a single node cluster shouldn't have a lot of impact. The same goes for the initial tablet count, as all data is stored on a single node anyway.
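
To make the startup flow concrete, here is a simplified sketch of the check-then-create logic. The `keyspaceBackend` interface, `ensureKeyspace`, and the in-memory `fakeBackend` are all hypothetical names used for illustration; the real SM code talks to the DB session directly and may be structured differently.

```go
package main

import "fmt"

// keyspaceBackend is a hypothetical stand-in for the SM DB session.
type keyspaceBackend interface {
	KeyspaceExists(name string) (bool, error)
	CreateKeyspace(name string) error
}

// ensureKeyspace uses an existing keyspace as-is and creates a missing one
// with the default statement. It reports whether the keyspace was created.
func ensureKeyspace(b keyspaceBackend, name string) (bool, error) {
	exists, err := b.KeyspaceExists(name)
	if err != nil {
		return false, err
	}
	if exists {
		return false, nil
	}
	if err := b.CreateKeyspace(name); err != nil {
		return false, err
	}
	return true, nil
}

// fakeBackend keeps a keyspace name -> replication class map in memory.
type fakeBackend struct {
	strategies map[string]string
}

func (f *fakeBackend) KeyspaceExists(name string) (bool, error) {
	_, ok := f.strategies[name]
	return ok, nil
}

func (f *fakeBackend) CreateKeyspace(name string) error {
	// New keyspaces get the new default strategy.
	f.strategies[name] = "NetworkTopologyStrategy"
	return nil
}

func main() {
	// Existing deployment: the SimpleStrategy keyspace is left untouched.
	existing := &fakeBackend{strategies: map[string]string{"scylla_manager": "SimpleStrategy"}}
	created, _ := ensureKeyspace(existing, "scylla_manager")
	fmt.Println(created, existing.strategies["scylla_manager"])

	// Fresh deployment: the keyspace is created with the new default.
	fresh := &fakeBackend{strategies: map[string]string{}}
	created, _ = ensureKeyspace(fresh, "scylla_manager")
	fmt.Println(created, fresh.strategies["scylla_manager"])
}
```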

In terms of rf_rack_valid_keyspaces, SM does not use any advanced features like views, LWT, CDC, etc. It simply creates tables (with some UDTs) and inserts/selects/updates them, so I would assume that we don't need it here. Tests for this PR were executed with 2025.4 as the SM backend (without rf_rack_valid_keyspaces) and all looked good.

@paszkow does this answer your concerns (or perhaps it created more of them)?

Michal-Leszczynski marked this pull request as ready for review December 30, 2025 08:56

Michal-Leszczynski requested review from VAveryanov8 and karol-kokoszka as code owners December 30, 2025 08:56

Michal-Leszczynski requested a review from paszkow December 30, 2025 08:57

Michal-Leszczynski mentioned this pull request Dec 30, 2025

Explicitly disable tablets for SimpleStrategy replication #4555

Open

Michal-Leszczynski added 4 commits December 30, 2025 10:52

refactor(docs): install, replace Scylla with ScyllaDB

19e3b21

feat(docs): explain that SM ks uses NetworkTopologyStrategy by default

9dba132

Michal-Leszczynski force-pushed the ml/4555-move-away-from-simple-strat branch from ba6360e to 2e0a379 Compare December 30, 2025 09:53

Michal-Leszczynski changed the title ~~Ml/4555 move away from simple strat~~ Move away from simple strat for SM keyspace Jan 7, 2026

Michal-Leszczynski mentioned this pull request Jan 20, 2026

Support GCP in native backup/restore #4588

Open

VAveryanov8 approved these changes Jan 22, 2026


paszkow reviewed Jan 23, 2026



4 participants
